
    The Role of Vocabulary Mediation to Discover and Represent Relevant Information in Privacy Policies

    To date, the effort made by existing vocabularies to provide a shared representation of the data protection domain is not fully exploited. Different natural language processing (NLP) techniques have been applied to the text of privacy policies without, however, taking advantage of existing vocabularies to provide those documents with a shared semantic superstructure. In this paper we show how a recently released domain-specific vocabulary, the Data Privacy Vocabulary (DPV), can be used to discover, in privacy policies, the information that is relevant with respect to the concepts modelled in the vocabulary itself. We also provide a machine-readable representation of this information to bridge the unstructured text and the formal taxonomy modelled in the vocabulary. This is the first approach to the automatic processing of privacy policies that relies on the DPV, fuelling further investigation into the applicability of existing semantic resources to promote the reuse of information and the interoperability between systems in the data protection domain.
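    The abstract gives no code, but the general idea can be illustrated. Below is a minimal sketch, assuming a local copy of the DPV serialization (downloaded from https://w3id.org/dpv), that concept labels are exposed via skos:prefLabel, and that a naive substring match stands in for the paper's actual NLP pipeline; none of these specifics are from the paper.

```python
# Minimal sketch: link privacy-policy sentences to DPV concepts by label matching.
# Assumptions (not from the paper): a local copy of the DPV Turtle serialization,
# concept labels exposed via skos:prefLabel, and substring matching as a crude
# stand-in for the paper's NLP techniques.
import re
from rdflib import Graph
from rdflib.namespace import SKOS

dpv = Graph()
dpv.parse("dpv.ttl", format="turtle")  # assumed local copy from https://w3id.org/dpv

# Collect (concept IRI, lowercase label) pairs from the vocabulary.
labels = [(iri, str(lbl).lower()) for iri, lbl in dpv.subject_objects(SKOS.prefLabel)]

def annotate(policy_text):
    """Return (sentence, matched DPV concept IRIs) pairs for a policy text."""
    sentences = re.split(r"(?<=[.!?])\s+", policy_text)
    annotated = []
    for sent in sentences:
        low = sent.lower()
        hits = [str(iri) for iri, label in labels if label in low]
        if hits:
            annotated.append((sent, hits))
    return annotated

policy = "We collect your email address for marketing purposes. Data is stored in the EU."
for sentence, concepts in annotate(policy):
    print(sentence, "->", concepts)
```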

    Mining Meaning from Text by Harvesting Frequent and Diverse Semantic Itemsets

    In this paper, we present a novel and completely unsupervised approach to unravel meanings (or senses) from linguistic constructions found in large corpora by introducing the concept of a semantic vector. A semantic vector is a space-transformed vector whose features represent fine-grained semantic information units rather than co-occurrence counts within a collection of texts. More specifically, instead of seeing words as vectors of frequency values, we propose to first explode words into a multitude of small semantic information units retrieved from existing resources such as WordNet and ConceptNet, and then to cluster them into frequent and diverse patterns. In this way, on the one hand, we are able to model linguistic data with a larger but much denser and more informative semantic feature space. On the other hand, since the model is based on basic, conceptual information, we are also able to generate new data by querying the above-mentioned semantic resources with the features contained in the extracted patterns. We tested the idea on a dataset of 640 million subject-verb-object triples to automatically induce senses for specific input verbs, demonstrating the validity and the potential of the presented approach for modelling and understanding natural language.
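    A minimal sketch of the "explode then mine" idea follows, assuming WordNet hypernyms as the fine-grained semantic units and a plain pair counter as a stand-in for the frequent/diverse itemset miner; the toy triples and all names are illustrative, not from the paper.

```python
# Minimal sketch: explode words into semantic features, then mine co-occurring
# feature itemsets. Assumptions (not from the paper): WordNet hypernyms serve
# as the semantic units, and a simple counter over feature pairs stands in for
# the frequent/diverse itemset miner.
from collections import Counter
from itertools import combinations
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def semantic_features(word, pos=wn.NOUN):
    """Explode a word into the names of all hypernyms of its first sense."""
    synsets = wn.synsets(word, pos=pos)
    if not synsets:
        return set()
    return {s.name() for path in synsets[0].hypernym_paths() for s in path}

# Toy corpus of (subject, verb, object) triples.
triples = [("cat", "eat", "fish"), ("dog", "eat", "meat"), ("child", "eat", "apple")]

# For each triple, build the itemset of semantic features of its object.
itemsets = [frozenset(semantic_features(obj)) for _, _, obj in triples]

# Count co-occurring feature pairs across itemsets (stand-in for the miner).
pair_counts = Counter()
for items in itemsets:
    pair_counts.update(combinations(sorted(items), 2))

for pair, freq in pair_counts.most_common(5):
    print(freq, pair)
```

    Objects of "eat" share hypernyms such as food.n.01, so frequent feature pairs surface a candidate sense of the verb from shared conceptual structure rather than raw word co-occurrence.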

    Frequent Use Cases Extraction from Legal Texts in the Data Protection Domain

    With the recent entry into force of the General Data Protection Regulation (GDPR), a growing number of documents issued by European Union institutions and authorities mention and discuss various use cases that must be handled to comply with GDPR principles. This contribution addresses the problem of extracting recurrent use cases from legal documents belonging to the data protection domain by exploiting existing Ontology Design Patterns (ODPs). We provide an analysis of the ODPs that can be searched for in data-protection-related documents, together with a first insight into how natural language processing techniques can be exploited to identify recurrent ODPs in legal texts. The proposed approach thus aims to identify standard use cases in the data protection field at the EU level, promoting the reuse of existing formalisations of knowledge.
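    The abstract only hints at the NLP side, so the following is a speculative sketch: spaCy dependency parsing supplies subject-verb-object triples, and a hand-made verb lexicon maps them to a hypothetical "DataProcessing" pattern. The lexicon, the pattern name, and the mapping are all assumptions for illustration.

```python
# Minimal sketch: spot candidate ODP instances in legal text via dependency
# parsing. Assumptions (not from the paper): a hypothetical verb lexicon
# signals a hypothetical "DataProcessing" pattern.
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical lexicon: verbs that signal the pattern we are looking for.
PROCESSING_VERBS = {"process", "collect", "store", "transfer", "erase"}

def candidate_patterns(text):
    """Yield (subject, verb, object) triples whose verb signals the pattern."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "VERB" and token.lemma_ in PROCESSING_VERBS:
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            if subjects and objects:
                yield (subjects[0].text, token.lemma_, objects[0].text)

text = "The controller shall erase personal data without undue delay."
for triple in candidate_patterns(text):
    print("Candidate DataProcessing instance:", triple)
```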

    A Bimodal Network Approach to Model Topic Dynamics

    This paper presents an intertemporal bimodal network to analyze the evolution of the semantic content of a scientific field within the framework of topic modeling, using Latent Dirichlet Allocation (LDA). The main contribution is the conceptualization of topic dynamics and its formalization and codification into an algorithm. To benchmark the effectiveness of this approach, we propose three indexes that track the transformation of topics over time, their rates of birth and death, and the novelty of their content. Applying LDA, we test the algorithm both on a controlled experiment and on a corpus of several thousand scientific papers spanning more than 100 years of the history of economic thought.
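    The paper's bimodal-network indexes are not specified in the abstract; below is a minimal sketch of the underlying mechanic, assuming gensim's LdaModel trained per time slice and cosine similarity between topic-word distributions as a crude proxy for detecting topic birth across periods. The toy corpus and threshold logic are illustrative assumptions.

```python
# Minimal sketch: track topic dynamics across time slices. Assumptions (not
# from the paper): gensim's LdaModel per period, and cosine similarity between
# topic-word vectors as a proxy for the paper's birth/death/novelty indexes.
import numpy as np
from gensim import corpora
from gensim.models import LdaModel

def fit_lda(docs, dictionary, num_topics=2):
    """Train an LDA model on tokenized docs over a shared dictionary."""
    bow = [dictionary.doc2bow(d) for d in docs]
    return LdaModel(bow, num_topics=num_topics, id2word=dictionary, random_state=0)

# Two toy time slices of tokenized documents.
period1 = [["bank", "money", "credit"], ["trade", "tariff", "export"]]
period2 = [["bank", "credit", "risk"], ["labour", "wage", "union"]]

# Share one dictionary so topic-word vectors from both periods are comparable.
shared = corpora.Dictionary(period1 + period2)
t1 = fit_lda(period1, shared).get_topics()  # rows: topics over shared vocab
t2 = fit_lda(period2, shared).get_topics()

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# A period-2 topic with no close period-1 counterpart is a candidate "birth".
for j, topic in enumerate(t2):
    best = max(cosine(topic, old) for old in t1)
    print(f"period-2 topic {j}: best match with period 1 = {best:.2f}")
```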